An Indian soyabean dataset for identification and classification of diseases using computer-vision algorithms

Intelligent agriculture relies heavily on agricultural disease image recognition. India is responsible for a large share of French bean production, accounting for 37.25% of total production. In the southern region of Maharashtra state, India, this crop is cultivated three times a year. The soyabean plant is planted in June and July, in September and October during the rabi season, and in February. Soyabean is a common crop in the Maharashtrian districts of Pune, Satara, Ahmednagar, Solapur, and Nashik, among others. In Maharashtra, soyabean is grown over an area of around 31,050 hectares. This paper presents a dataset of leaves from soyabean plants that are either insect-damaged or healthy. Images were taken on several farms over a period of at most two to three seasons. The dataset contains 3363 photos in seven folders: six category folders and one folder holding all raw leaf images. The six categories are: I) Healthy plants II) Vein necrosis III) Dry leaf IV) Septoria brown spot V) Root images VI) Bacterial leaf blight. The goal of this work is to give academics and students access to the dataset so that they may use it in their studies and to build machine learning models.


Value of the Data
• The dataset presented here is a collection of soyabean plant leaf images captured using mobile devices.
• Researchers and learners from many fields can use the dataset, which comprises 1500 processed photos [2]. Researchers may use the dataset to review and validate the data as needed with various predictive models, and to evaluate the accuracy of the algorithms.
• The dataset is open and freely downloadable by the general public, so researchers may train machine learning models on it without performing any additional pre-processing or verification.
• The data may be used to develop high-quality tools for identifying and categorising diseases in soyabean plant leaves that benefit society [3].

Objectives
(a) A dataset with several disease classes present on soyabean plant leaves can aid AI/ML algorithms in real-time disease detection and classification.
(b) Pre-processing the dataset can help an AI/ML model perform more accurately.

Data Description
The six classes that make up this dataset are healthy, vein necrosis, dry leaf, Septoria brown spot, root images and bacterial leaf blight. The first folder has 288 healthy images (single-leaf and multi-leaf images). The second folder has 138 images of vein necrosis (single-leaf and multi-leaf images). The third folder has 230 images of dry leaf (single-leaf and multi-leaf images). The fourth folder contains 284 images of Septoria brown spot. The fifth folder contains 10 images of roots. The sixth folder contains 226 images of bacterial leaf blight, while the last folder, "leaf images all (raw)", contains 2187 images (Table 1). In this section, we examine the characteristic symptoms of the diseases identified in the dataset's leaf photos. Examples of each disease and of the healthy group are shown in Fig. 1.
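As a quick consistency check on the figures above, the per-folder counts can be tallied in a few lines of Python. This is only a sketch: the dictionary keys are illustrative labels chosen here, not the actual folder names in the dataset, and the helper `summarise` is hypothetical.

```python
# Image counts per folder as reported in the Data Description.
# Keys are illustrative labels (an assumption), not the real folder names.
FOLDER_COUNTS = {
    "healthy": 288,
    "vein_necrosis": 138,
    "dry_leaf": 230,
    "septoria_brown_spot": 284,
    "root_images": 10,
    "bacterial_leaf_blight": 226,
    "leaf_images_all_raw": 2187,
}

RAW_FOLDER = "leaf_images_all_raw"


def summarise(counts, raw_folder=RAW_FOLDER):
    """Return (labelled_total, grand_total, share_per_class)."""
    # The raw folder is unlabelled, so it is excluded from the class shares.
    labelled = {k: v for k, v in counts.items() if k != raw_folder}
    labelled_total = sum(labelled.values())
    grand_total = sum(counts.values())
    shares = {k: v / labelled_total for k, v in labelled.items()}
    return labelled_total, grand_total, shares


labelled_total, grand_total, shares = summarise(FOLDER_COUNTS)
print("labelled images:", labelled_total)
print("all images:", grand_total)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {share:6.1%}")
```

The six class folders together with the raw folder add up to the 3363 images stated for the dataset. The tally also makes the class imbalance visible: the root class holds under 1% of the labelled images, which may matter when training a classifier.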
Vein necrosis is brought on by a fungus that needs water on the surface of leaves to flourish; watering at the plant's base therefore helps to keep moisture off the leaves.
One of the most destructive diseases of the common bean in tropical and subtropical production zones is bacterial leaf spot (BLS), which is brought on by the bacterium Pseudocercospora griseola.
The common Septoria brown spot illness, fusarium wilt, has symptoms that resemble those of verticillium wilt. The symptoms include yellowing, stunting, and death of seedlings, as well as yellowing and stunting of older plants (Fig. 2).

Experimental Design, Materials and Methods
Images were taken with smartphones in July 2022 in the small village of Goudgaon, Tal: Barshi, Dist: Solapur, Maharashtra. The pictures were taken at that time because the period is ideal for soyabean plants in the area. Plants were photographed under sunny conditions at various stages.
Three stages made up the pre-processing of the photos.
(1) Data Acquisition: The photographs were taken using the high-quality back camera of a smartphone. All 3363 pictures were taken with this camera, sorted, and stored in the appropriate folders, as shown in Fig. 3.
(2) Image size: In this phase, the images of different sizes collected in the village of Goudgaon, Maharashtra, India, are brought to a common size.
(3) Dataset split: The dataset is separated into training and test sets to assess how well a machine learning model works. Splitting is needed to detect overfitting, because the model is evaluated on images it has not seen during training.
The first phase of pre-processing is to arrange the images into six folders for classification: 1) Healthy 2) Vein necrosis 3) Dry leaf 4) Septoria brown spot 5) Root images 6) Bacterial leaf blight.
The second step is one of the most crucial, because the photographs are taken using smartphones and vary in size, with dimensions such as 1600 × 1200 pixels (width × height) at 96 dots per inch. We use a resize() function in Python to bring the images to a standard size of 300 × 30 pixels, so that all photographs have the same dimensions. To rotate and zoom the pictures, alter the brightness range, and perform other operations on the images, we use the ImageDataGenerator class from keras.preprocessing.image.
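To illustrate what the resizing and augmentation steps do to the pixel data, the following dependency-free Python sketch applies nearest-neighbour resizing, a brightness change, and a horizontal flip to a small greyscale grid. It is not the code used to prepare the dataset: the function names are hypothetical, the interpolation method is an assumption, and in practice a library call (for example Pillow's `Image.resize`, or the Keras generator mentioned above) would replace these helpers.

```python
def resize_nearest(pixels, new_width, new_height):
    """Resize a row-major grid of pixels with nearest-neighbour sampling."""
    old_height = len(pixels)
    old_width = len(pixels[0])
    resized = []
    for y in range(new_height):
        # Map each target row back to the source row it samples from.
        src_y = min(old_height - 1, y * old_height // new_height)
        row = [
            pixels[src_y][min(old_width - 1, x * old_width // new_width)]
            for x in range(new_width)
        ]
        resized.append(row)
    return resized


def adjust_brightness(pixels, factor):
    """Scale 8-bit grey values by `factor`, clamping to the 0..255 range."""
    return [[max(0, min(255, round(v * factor))) for v in row] for row in pixels]


def flip_horizontal(pixels):
    """Mirror the image left to right."""
    return [row[::-1] for row in pixels]


# A tiny 4 x 4 stand-in for a photograph (values are made up for illustration).
image = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]

small = resize_nearest(image, new_width=2, new_height=2)
brighter = adjust_brightness(small, 1.5)
mirrored = flip_horizontal(small)
print(small, brighter, mirrored)
```

The target width and height are parameters, so the same helper can bring images of any input size to one common size before they are passed to a model.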
The third phase is to split the dataset for training and testing, as shown in Fig. 4.
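A train/test split of this kind can be sketched as follows. The split ratio is not stated in the text, so the 80/20 ratio, the random seed, the per-class (stratified) strategy, and the file names below are all assumptions made for illustration; only the per-class image counts come from the Data Description.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (path, label) pairs into train and test lists, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        # Keep at least one test image per class when the class has 2+ images.
        n_test = max(1, round(len(paths) * test_fraction)) if len(paths) > 1 else 0
        test += [(p, label) for p in paths[:n_test]]
        train += [(p, label) for p in paths[n_test:]]
    return train, test


# Hypothetical file names; the counts (288 healthy, 10 root) follow the paper.
samples = [(f"healthy/img_{i}.jpg", "healthy") for i in range(288)]
samples += [(f"root/img_{i}.jpg", "root") for i in range(10)]

train, test = stratified_split(samples)
print(len(train), "training images,", len(test), "test images")
```

Splitting class by class keeps small classes, such as the 10 root images, represented in both sets, which a purely random split over the whole dataset would not guarantee.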

Table 1
Types of disease and number of images.